Automatic retrieval and presentation of information relevant to the context of a user's conversation

ABSTRACT

Methods, apparatus and computer-code for electronically retrieving and presenting information are disclosed herein. In some embodiments, information is retrieved and presented in accordance with at least one feature of electronic media content of a multi-party conversation. Optionally, the multi-party conversation is a video conversation and at least one feature is a video content feature. Exemplary features include but are not limited to speech delivery features, key word features, topic features, background sound or image features, deviation features and biometric features.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims the benefit of U.S. Provisional Patent Application No. 60/821,272 filed Aug. 2, 2006 by the present inventors, and U.S. Provisional Patent Application No. 60/824,323 filed Sep. 1, 2006 by the present inventors.

FIELD OF THE INVENTION

The present invention relates to techniques for information retrieval and presentation.

BACKGROUND AND RELATED ART

Knowledge bases contain enormous amounts of information on any topic imaginable. To tap this information, however, users need to explicitly issue a search request. The explicit search process requires the user to:

(i) realize that he needs a specific piece of information;

(ii) select the information source(s); and

(iii) formulate a query expression and execute it against the information source(s).

The following published patent applications provide potentially relevant background material: US 2006/0167747; US 2003/0195801; US 2006/0188855; US 2002/0062481; and US 2005/0234779. All references cited herein are incorporated by reference in their entirety. Citation of a reference does not constitute an admission that the reference is prior art.

SUMMARY

The present inventors are now disclosing a technique wherein a multi-party voice conversation is monitored (i.e. by monitoring electronic media content of the multi-party voice conversation), and in accordance with at least one feature of the electronic media content, information is retrieved and presented to at least one conversation party of the multi-party voice conversation.

Exemplary information sources from which information is retrieved include but are not limited to search engines, news services, image or video banks, RSS feeds, and blogs. The information source may be local (for example, the local file system of a desktop computer or PDA) and/or may be remote (for example, a remote "Internet" search engine accessible via the Internet).

Not wishing to be bound by any theory, it is noted that by monitoring the multi-party voice conversation that is not directed to an entity doing the monitoring, an "implicit information retrieval request" may be formulated, thereby relieving the user of any requirement to explicitly formulate an information retrieval request and direct that information retrieval request to an information retrieval service.

Furthermore, the present inventors are now disclosing that the nature of the information retrieval and/or presentation of the retrieved information may be adapted to certain detectable features of the conversation and/or features of the conversation participants.

In one example, a demographic profile of a given user may be generated (i.e. either from detectable features of the conversation and/or other information sources). Thus, in one particular example, two individuals are speaking to each other in English (for example, using a "Skype" connection, or on cell phones), but one of the individuals has a Spanish accent. According to this example, the individual with the Spanish accent may be presented with retrieved Spanish-language information (for example, from a Spanish-language newswire retrieved using "keywords" translated from the English-language conversation).

In another example related to retrieval and/or presentation of information in accordance with a demographic profile, two users are speaking about applying to law school. One speaker is younger (say, less than 25 years old) and another speaker is over 40. The "age demographic" of the speakers is detected from electronic media content of the multi-party conversation, and the older user may be served an article about law-school essay strategies for older law-school applicants, while the younger user may be served a profile from a dating website for college-aged students interested in dating a pre-law major.

If, for example, one user is speaking on a cell phone in Boston and another user is speaking on a cell phone in Florida, the Boston-based user may be provided information about New England law schools while the Florida-based user may be provided information about Florida law schools. This is an example of retrieving information according to a location of a participant in a multi-party conversation.

In another example related to retrieval and/or presentation of information in accordance with a demographic profile, a man and woman may be speaking about movies, and the "gender demographic" is detected. The man may be served information (for example, movie starting times) about movies popular with men (for example, horror movies, action movies, etc.) while the woman may be served information about movies popular with women (for example, romance movies). If the man is located on the "north side of town" and the woman on the "south side of town," the man may be provided information about movie start times on the "north side" while the woman is provided information about movie start times on the "south side."

In another example, information may be retrieved and/or presented in accordance with an emotion of one or more conversation-participants. For example, if it is detected that a person is angry, a link to anger-management material may be presented. In a similar example, if it is detected that a person is angry, a link to a clip of relaxing music may be presented.

In another example related to emotion-based information retrieval, if two people are speaking about a given rock-and-roll band, links to clips of the band's music may be presented. In one variation, certain songs of the rock-and-roll band may be pre-categorized as "happy songs" or "sad songs." If one or both of the conversation-participants are detected as "happy" (for example, according to key words, body language, and/or voice tones), then links to clips of "happy songs" are presented.

In another example, information may be retrieved and/or presented in accordance with a "conversation participants relation." Thus, if it is determined or assessed that two conversation participants are spouses or lovers, when they speak about the particular rock-and-roll band, links to clips of "love songs" from this band are presented to the users. Alternatively, if it is determined that two conversation participants are not friends or lovers but only business acquaintances, the "most popular" songs from the band may be presented to the users instead, and the "romantic" songs may be filtered out.

In another example, information may be retrieved and/or presented in accordance with a physiological status feature of the user. In this example, if a user coughs often during the conversation, a link to a Wikipedia article or a Medscape article about the flu may be presented to the user.

In another example, information may be retrieved and/or presented in accordance with one or more personality traits or personality-profile features of the user. According to one particular example, an "extroverted" or "people-oriented" person might, when discussing a certain city with a friend, receive information about "people-oriented" activities that are done in groups. Conversely, an "introverted" person may receive information about activities done in solitude.

It is now disclosed for the first time a method of providing information-retrieval services. The method includes the steps of: a) monitoring a multi-party voice conversation not directed at the entity doing the monitoring; and b) in accordance with content of the monitored voice conversation, retrieving and presenting information to at least one party of the multi-party voice conversation.

Non-limiting examples of information items that may be retrieved include but are not limited to i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.

According to some embodiments, the retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

According to some embodiments, the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

According to some embodiments, the retrieving includes effecting a disambiguation in accordance with a demographic parameter of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

According to some embodiments, the assigning includes assigning a keyword weight in accordance with a speech delivery feature of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation, the speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.

According to some embodiments, the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a geographic location of a given party of the multi-party voice conversation estimated from electronic media of the multi-party conversation.

According to some embodiments, the retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with an accent feature of at least one given party of the multi-party voice conversation.

According to some embodiments, the information-presenting for a first set of words extracted from the multi-party conversation includes displacing earlier-presented retrieved information associated with a second, earlier set of words extracted from the multi-party conversation, in accordance with relative speech delivery parameters of the first and second sets of extracted words, the speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.

According to some embodiments, the multi-party voice conversation is carried out between a plurality of client terminal devices communicating via a wide-area network, and for a given client device of the client device plurality: i) the information retrieval is carried out for incoming content relative to the given client device; and ii) the information presenting is on a display screen of the given client device.

It is now disclosed for the first time a method of providing information-retrieval services, the method comprising: a) monitoring a terminal device for incoming media content and outgoing media content of a multi-party conversation; and b) in accordance with the incoming media content, retrieving information over a remote network and presenting the retrieved information on the monitored terminal device.

According to some embodiments, the retrieving includes sending content of the multi-party conversation to an Internet search engine, and the presenting includes presenting search results from the Internet search engine.

According to some embodiments, the retrieving includes retrieving at least one of: i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.

It is now disclosed for the first time a method of providing information-retrieval services, the method comprising: a) monitoring a given terminal client device for an incoming or outgoing remote call; b) upon detecting the incoming or outgoing remote call, sending content of the detected incoming or outgoing call over a wide-area network to a search engine; and c) presenting search results from the search engine on the monitored terminal device.

A Discussion of Various Features of Electronic Media Content

According to some embodiments, the at least one feature of the electronic media content includes at least one speech delivery feature, i.e. describing how a given set of words is delivered by a given speaker. Exemplary speech delivery features include but are not limited to: accent features (i.e. which may be indicative, for example, of whether or not a person is a native speaker and/or of an ethnic origin), speech tempo features (i.e. which may be indicative of a mood or emotional state), voice pitch features (i.e. which may be indicative, for example, of an age of a speaker), voice loudness features, voice inflection features (i.e. which may be indicative of a mood including but not limited to angry, confused, excited, joking, sad, sarcastic, serious, etc.) and an emotional outburst feature (defined here as a presence of laughing and/or crying).

In another example, a speaker speaks some sentences or words loudly, or in an excited state, while other sentences or words are spoken more quietly. According to this example, when retrieving and/or presenting information, different words are given a different "weight" in accordance with an assigned importance, and words or phrases spoken "loudly" or in an "excited state" are given a higher weight than words or phrases spoken quietly.
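By way of non-limiting illustration only, such weighting might be implemented along the following lines. This is a minimal sketch, not the disclosed method; the loudness values and excitement flags are assumed to be supplied by an upstream audio-analysis stage not shown here, and all thresholds are illustrative.

```python
from dataclasses import dataclass

@dataclass
class SpokenPhrase:
    text: str
    loudness_db: float  # average loudness of the phrase, from audio analysis
    excited: bool       # emotional-state flag, from an upstream classifier

def weight_phrases(phrases, baseline_db=60.0):
    """Assign each phrase a retrieval weight: louder or excited speech
    is treated as more important than quietly spoken speech."""
    weighted = {}
    for p in phrases:
        weight = 1.0
        if p.loudness_db > baseline_db:
            weight += (p.loudness_db - baseline_db) / 10.0  # louder -> heavier
        if p.excited:
            weight *= 1.5
        weighted[p.text] = weight
    return weighted

# "Millard Fillmore", spoken loudly and excitedly, outweighs "school".
phrases = [SpokenPhrase("Millard Fillmore", 72.0, True),
           SpokenPhrase("school", 55.0, False)]
print(weight_phrases(phrases))  # {'Millard Fillmore': ~3.3, 'school': 1.0}
```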

In some embodiments, the multi-party conversation is a video conversation, and the at least one feature of the electronic media content includes a video content feature.

Exemplary video content features include but are not limited to:

i) visible physical characteristics of a person in an image—including but not limited to indications of a size of a person and/or a person's weight and/or a person's height and/or eye color and/or hair color and/or complexion;

ii) features of objects or persons in the 'background'—i.e. background objects other than a given speaker—for example, including but not limited to room furnishing features and a number of people in the room simultaneously with the speaker;

iii) a detected physical movement feature—for example, a body-movement feature including but not limited to a feature indicative of hand gestures or other gestures associated with speaking.

According to some embodiments, the at least one feature of the electronic media content includes at least one key word feature indicative of a presence and/or absence of key words or key phrases in the spoken content, and the information search and/or retrieval is carried out in accordance with the at least one key word feature.

In one example, the key word feature is determined by using a speech-to-text converter for extracting text. The extracted text is then analyzed for the presence of key words or phrases. Alternatively or additionally, the electronic media content may be compared with sound clips that include the key words or phrases.
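A minimal sketch of this key word extraction step follows; the transcript is assumed to come from any speech-to-text engine, and the key-phrase list is an illustrative assumption.

```python
import re

# Hypothetical output of an upstream speech-to-text stage.
transcript = "We were talking about the fire downtown near the old mill."

KEY_PHRASES = {"fire", "traffic", "weather", "movies"}

def extract_key_words(text, key_phrases):
    """Return which key words/phrases are present in the extracted text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return tokens & key_phrases

print(extract_key_words(transcript, KEY_PHRASES))  # -> {'fire'}
```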

According to some embodiments, the at least one feature of the electronic media content includes at least one topic category feature—for example, a feature indicative of whether a topic of a conversation or portion thereof matches one or more topic categories selected from a plurality of topic categories, for example, including but not limited to sports (i.e. a conversation related to sports), romance (i.e. a romantic conversation), business (i.e. a business conversation), current events, etc.

According to some embodiments, the at least one feature of the electronic media content includes at least one topic change feature. Exemplary topic change features include but are not limited to a topic change frequency, an impending topic change likelihood, an estimated time until a next topic change, and a time since a previous topic change.

Thus, in one example, retrieved information is displayed to a user, and when the conversation topic changes, previously-displayed information associated with a 'previous topic' is either removed from the user display and replaced with newer information, or is "scrolled down" or displayed less prominently. The rate at which new information (i.e. in accordance with a newer topic of the conversation) replaces older information can be adjusted in accordance with a number of factors, for example, the personality of one or more users (for example, with impulsive users, displayed retrieved information is replaced faster), an emotion associated with one or more words, and other factors.
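One way the display behavior described above might look in code is sketched below; the scroll-rate parameter standing in for the user-personality adjustment is an assumption for illustration.

```python
import collections

class RetrievedInfoDisplay:
    """Keeps the newest topic's results on top; older results scroll down
    and are eventually dropped. scroll_rate controls how aggressively old
    results are displaced (e.g. faster for 'impulsive' users)."""

    def __init__(self, max_items=5, scroll_rate=1):
        self.items = collections.deque(maxlen=max_items)
        self.scroll_rate = scroll_rate

    def on_topic_change(self, new_results):
        # Drop up to scroll_rate of the oldest items, then push new ones on top.
        for _ in range(min(self.scroll_rate, len(self.items))):
            self.items.pop()
        for result in reversed(new_results):
            self.items.appendleft(result)

    def render(self):
        return list(self.items)  # index 0 is displayed most prominently

display = RetrievedInfoDisplay(scroll_rate=2)
display.on_topic_change(["Article: law-school essay strategies"])
display.on_topic_change(["Clip: the band's most popular song"])
print(display.render())  # the newest topic's result is now on top
```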

In some embodiments, the at least one feature of the electronic media content includes at least one 'demographic property' feature indicative of and/or derived from at least one demographic property or estimated demographic property (for example, age, gender, etc.) of a person involved in the multi-party conversation (for example, a speaker). For example, two users who are over the age of 30 who speak about "Madonna" may be served a link to music clips of Madonna's songs from the 1980s, while teenagers may be served a link to a music clip of one of Madonna's more recently released songs.

On the other hand, two users with a demographic profile of "devout Catholic" may be served an image of the Blessed Virgin Mary.

Exemplary demographic property features include but are not limited to gender features (for example, related to voice pitch or to hair length or any other gender features), educational level features (for example, related to spoken vocabulary words used), household income features (for example, related to educational level features and/or key words related to expenditures and/or images of room furnishings), a weight feature (for example, related to overweight/underweight—e.g. related to size in an image or breathing rate, where obese individuals are more likely to breathe at a faster rate), age features (for example, related to an image of a balding head or gray hair and/or vocabulary choice and/or voice pitch), and ethnicity features (for example, related to skin color and/or accent and/or vocabulary choice). Another feature that, in some embodiments, may indicate a person's demography is the use (or lack of use) of certain expressions, including but not limited to profanity. For example, people from certain regions or age groups may be more likely to use profanity (or a certain type), while those from other regions or age groups may be less likely to use profanity (or a certain type).

Not wishing to be bound by theory, it is noted that there are some situations where it is possible to perform 'on the fly demographic profiling' (i.e. obtaining demographic features derived from the media content), obviating the need, for example, for 'explicitly provided' demographic data, for example, from questionnaires or purchased demographic data. This may allow, for example, targeting of more appropriate or more pertinent information.

Demographic property features may be derived from audio and/or video features and/or word content features. Exemplary features from which demographic property features may be derived include but are not limited to: idiom features (for example, certain ethnic groups or people from certain regions of the United States may tend to use certain idioms), accent features, grammar compliance features (for example, more highly educated people are less likely to make grammatical errors), and sentence length features (for example, more highly educated people are more likely to use longer or more 'complicated' sentences).
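The sentence-length and grammar-compliance features mentioned above could be computed, very crudely, as follows. This is a toy sketch: a real system would use a proper parser, and the 'error' patterns here are illustrative assumptions only.

```python
import re

def education_features(text):
    """Compute average sentence length and a toy grammar-compliance proxy."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = [len(s.split()) for s in sentences]
    avg_len = sum(words_per_sentence) / len(words_per_sentence)
    # Toy proxy for grammatical non-compliance (illustrative patterns only).
    errors = len(re.findall(r"\bain't\b|\bdon't got no\b", text.lower()))
    return {"avg_sentence_length": avg_len, "error_count": errors}

print(education_features("We visited the museum. It was fascinating, wasn't it?"))
```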

In one example related to "educational level," people associated with the more highly educated demographic group are more likely to be served content or links to content from the "New York Times" (i.e. a publication with more "complicated" writing and vocabulary) while a "less educated" user is served content or links to content from the "New York Post" (i.e. a publication with simpler writing and vocabulary).

In some embodiments, the at least one feature of the electronic media content includes at least one 'physiological feature' indicative of and/or derived from at least one physiological property or estimated physiological property of a person involved in the multi-party conversation (for example, a speaker)—i.e. as derived from the electronic media content of the multi-party conversation.

Exemplary physiological parameters include but are not limited to breathing parameters (for example, breathing rate or changes in breathing rate), sweat parameters (for example, indicative of whether a subject is sweating or how much—this may be determined, for example, by analyzing a 'shininess' of a subject's skin), a coughing parameter (i.e. a presence or absence of coughing, a loudness or rate of coughing, a regularity or irregularity of patterns of coughing), a voice-hoarseness parameter, and a body-twitching parameter (for example, twitching of the entire body due to, for example, chills, or twitching of a given body part—for example, twitching of an eyebrow).

In one example, if the user is "excited" when speaking certain key words, this could cause the user to be served information where the key words spoken when excited are given extra "weight" in any information search or retrieval or display.

In another example, a person may twitch a body part when nervous or lying. If it is assessed that a user or speaker is "lying," this could also influence search results.

In some embodiments, the at least one feature of the electronic media content includes at least one 'background item feature' indicative of and/or derived from background sounds and/or a background image. It is noted that the background sounds may be transmitted along with the voice of the conversation, and thus may be included within the electronic media content of the conversation.

In one example, if a dog is barking in the background and this is detected, a news article about recently-passed local ordinances regulating dog ownership may be displayed.

The background sound may be determined or identified, for example, by comparing the electronic media content of the conversation with one or more sound clips that include the sound it is desired to detect. These sound clips may thus serve as a 'template.'
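A minimal sketch of such template matching is shown below, using normalized cross-correlation; the threshold and the toy signals are illustrative assumptions, and a production system would operate on real audio frames.

```python
import numpy as np

def matches_template(background, template, threshold=0.7):
    """Slide the template clip over the background audio and report whether
    the peak normalized cross-correlation exceeds the threshold."""
    b = (background - background.mean()) / (background.std() + 1e-9)
    t = (template - template.mean()) / (template.std() + 1e-9)
    corr = np.correlate(b, t, mode="valid") / len(t)
    return corr.max() >= threshold

# Toy signals standing in for a conversation recording and a dog-bark clip.
rng = np.random.default_rng(0)
bark = rng.standard_normal(100)
conversation = np.concatenate(
    [rng.standard_normal(300), bark, rng.standard_normal(200)])
print(matches_template(conversation, bark))  # True: the 'bark' is present
```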

In another example, if a certain furniture item (for example, an 'expensive' furniture item) is detected in the background of a video conversation, an item (i.e. good or service) appropriate for the 'upscale' income group may be provided.

If it is determined that a user is affluent, then when the user mentions "boat," information about yachts may be displayed to the user. Conversely, a less-affluent user that discusses boats in a conversation may be provided information related to ferry cruises or fishing.

In yet another example, if an image of a crucifix is detected in the background of a video conversation, a news article about the Pope may be provided, or a link to a Catholic blog may be provided.

In some embodiments, the at least one feature of the electronic media content includes at least one temporal and/or spatial localization feature indicative of and/or derived from a specific location or time. Thus, in one example, when a Philadelphia-located user (for example, having a phone number in the 215 area code) discusses "sports," he/she is served sports stories (for example, from a newswire) about a recent Phillies or Eagles game, while a Baltimore-located user (for example, having a phone number in the 301 area code) is served sports stories about a recent Orioles or Ravens game.

This localization feature may be determined from the electronic media of the multi-party conversation.

Alternatively or additionally, this localization feature may be determined from data from an external source, for example, GPS and/or mobile phone triangulation.

Another example of an 'external source' for localization information is a dialed telephone number. For example, certain area codes or exchanges may be associated (but not always) with certain physical locations.
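For illustration, such an area-code lookup might be as simple as the sketch below; the table entries are examples, and, as noted above, an area code does not always reflect a caller's physical location.

```python
# Hypothetical lookup table; a real deployment would use a full
# numbering-plan database rather than a hand-written dictionary.
AREA_CODE_REGIONS = {
    "215": "Philadelphia, PA",
    "301": "Maryland",
    "617": "Boston, MA",
}

def region_from_number(phone_number):
    """Best-effort localization from a phone number's area code; returns
    None when the area code is not recognized."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    area_code = digits[-10:-7] if len(digits) >= 10 else None
    return AREA_CODE_REGIONS.get(area_code)

print(region_from_number("+1 (215) 555-0100"))  # -> 'Philadelphia, PA'
```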

In some embodiments, the at least one feature of the electronic media content includes at least one 'historical feature' indicative of electronic media content of a previous multi-party conversation and/or an earlier time period in the conversation—for example, electronic media content whose age is at least, for example, 5 minutes, or 30 minutes, or one hour, or 12 hours, or one day, or several days, or a week, or several weeks.

In some embodiments, the at least one feature of the electronic media content includes at least one 'deviation feature.' Exemplary deviation features of the electronic media content of the multi-party conversation include but are not limited to:

a) historical deviation features—i.e. a feature of a given subject or person that changes temporally so that, at a given time, the behavior of the feature differs from its previously-observed behavior (a minimal detection sketch follows this list). Thus, in one example, a certain subject or individual usually speaks slowly, and at a later time, this behavior 'deviates' when the subject or individual speaks quickly. In another example, a typically soft-spoken individual speaks with a louder voice.

In another example, an individual who 3 months ago was observed (e.g. via electronic media content) to be of average or above-average weight is obese. This individual may be served a Wikipedia link about weight-loss. In contrast, a user who is consistently obese may not be served the link in order not to "annoy" the user.

In another example, a person who is normally polite may become angry and rude—this may be an example of 'user behavior features.'

b) inter-subject deviation features—for example, a 'well-educated' person associated with a group of lesser-educated persons (for example, speaking together in the same multi-party conversation), or a 'loud-spoken' person associated with a group of 'soft-spoken' persons, or a 'Southern-accented' person associated with a group of persons with Boston accents, etc. If distinct conversations are recorded, then historical deviation features associated with a single conversation are referred to as intra-conversation deviation features, while historical deviation features associated with distinct conversations are referred to as inter-conversation deviation features.

c) voice-property deviation features—for example, an accent deviation feature, a voice pitch deviation feature, a voice loudness deviation feature, and/or a speech rate deviation feature. This may relate to user-group deviation features as well as historical deviation features.

d) physiological deviation features—for example, breathing rate deviation features, weight deviation features—this may relate to user-group deviation features as well as historical deviation features.

e) vocabulary or word-choice deviation features—for example, profanity deviation features indicating use of profanity—this may relate to user-group deviation features as well as historical deviation features.

f) person-versus-physical-location features—for example, a person with a Southern accent whose location is determined to be in a Northern city (e.g. Boston) might be provided with a hotel coupon.
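The sketch below illustrates the historical deviation idea from item (a): a running per-speaker baseline is kept, and an observation (here, speech rate in words per minute) is flagged when it departs from that baseline by more than a tolerance. The tolerance value and the running-mean baseline are illustrative assumptions.

```python
class DeviationDetector:
    """Flags when an observed value deviates from a speaker's historical
    baseline -- a minimal version of a 'historical deviation feature'."""

    def __init__(self, tolerance=0.3):
        self.history = {}           # speaker -> (running mean, sample count)
        self.tolerance = tolerance  # fractional deviation that triggers a flag

    def observe(self, speaker, value):
        mean, count = self.history.get(speaker, (value, 0))
        deviates = count > 0 and abs(value - mean) / max(mean, 1e-9) > self.tolerance
        # Update the running baseline with the new observation.
        self.history[speaker] = ((mean * count + value) / (count + 1), count + 1)
        return deviates

detector = DeviationDetector()
for wpm in [110, 115, 112]:
    detector.observe("speaker_1", wpm)     # builds the historical baseline
print(detector.observe("speaker_1", 180))  # True: much faster than usual
```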

In some embodiments, the at least one feature of the electronic media content includes at least one 'person-recognition feature.' This may be useful, for example, for providing pertinent retrieved information targeted for a specific person. Thus, in one example, the person-recognition feature allows access to a database of person-specific data where the person-recognition feature functions, at least in part, as a 'key' of the database. In one example, the 'data' may be previously-provided data about the person, for example, demographic data or other data, that is provided in any manner, for example, derived from electronic media of a previous conversation, or in any other manner.

In some embodiments, this may obviate the need for users to explicitly provide account information and/or to log in in order to receive 'personalized' retrieved information. Thus, in one example, the user simply uses the service, and the user's voice is recognized from a voice-print. Once the system recognizes the specific user, it is possible to present retrieved information in accordance with previously-stored data describing preferences of the specific user.
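A minimal sketch of the voice-print 'key' idea follows. Real voice-print recognition is far more involved (see the publications cited later in this disclosure); here a hypothetical embedding vector stands in for the print, and the similarity threshold is an illustrative assumption.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

class VoicePrintDB:
    """Maps voice-print embeddings to previously-stored per-user data,
    so no login or account entry is required."""

    def __init__(self, threshold=0.95):
        self.records = []  # (embedding, stored preference/demographic data)
        self.threshold = threshold

    def enroll(self, embedding, user_data):
        self.records.append((embedding, user_data))

    def lookup(self, embedding):
        """Return stored data for the closest enrolled print, or None."""
        best = max(self.records, key=lambda r: cosine(r[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None

db = VoicePrintDB()
db.enroll([0.9, 0.1, 0.3], {"language": "Spanish", "age_group": "40+"})
print(db.lookup([0.88, 0.12, 0.31]))  # recognized: the stored data is returned
```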

Exemplary 'person-recognition' features include but are not limited to biometric features (for example, voice-print or facial features) or other person visual appearance features, for example, the presence or absence of a specific article of clothing.

It is noted that the possibility of recognizing a person via a 'person-recognition' feature does not rule out the possibility of using more 'conventional' techniques—for example, logins, passwords, PINs, etc.

In some embodiments, the at least one feature of the electronic media content includes at least one 'person-influence feature.' Thus, it is recognized that during certain conversations, certain individuals may have more influence than others—for example, in a conversation between a boss and an employee, the boss may have more influence and may function as a so-called gatekeeper. For example, if one party of the conversation makes a certain statement, and this statement appears to influence one or more other parties of the conversation, the 'influencing statement' may be assigned more importance. For example, if party 'A' says 'we should spend more money on clothes' and party 'B' responds by saying 'I agree,' this could imbue party A's statement with additional importance, because it was an 'influential statement.'
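By way of illustration, the 'influential statement' boost might be computed as in the sketch below; the agreement markers and boost factor are illustrative assumptions, not part of the disclosure.

```python
AGREEMENT_MARKERS = ("i agree", "you're right", "good point", "exactly")

def boost_influential_statements(turns, boost=2.0):
    """Give extra weight to a statement when the next speaker's reply
    signals agreement. turns is a list of (speaker, utterance) pairs."""
    weighted = []
    for i, (speaker, text) in enumerate(turns):
        weight = 1.0
        if i + 1 < len(turns):
            reply = turns[i + 1][1].lower()
            if any(marker in reply for marker in AGREEMENT_MARKERS):
                weight = boost  # the statement appears to have influenced the reply
        weighted.append((text, weight))
    return weighted

turns = [("A", "We should spend more money on clothes"), ("B", "I agree")]
print(boost_influential_statements(turns))
```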

In one example, a user has several conversations in one day. The first conversation is with an "influential person" who may be "important"—for example, a client/boss to whom the user of a device shows deference. When the conversation with the "important" person begins, previous search results may be cleared from a display screen or scrolled down, and replaced with search results that relate to the conversation with the "important" person. Subsequently, the user may speak with a less "influential" person—for example, a child. In this example, during the second, subsequent conversation, previously-displayed retrieved information (for example, retrieved in accordance with the first conversation) is not replaced with information retrieved from the second conversation.

In some embodiments, the retrieval and/or presentation of information includes presenting information to a first individual (for example, person 'A') in accordance with one or more features of media content from a second individual different from the first individual (for example, person 'B').

Apparatus for Retrieving Information

Some embodiments of the present invention provide apparatus for retrieving and presenting information. The apparatus may be operative to implement any method or any step of any method disclosed herein. The apparatus may be implemented using any combination of software and/or hardware.

The data storage may be implemented using any combination of volatile and/or non-volatile memory, and may reside in a single device or reside on a plurality of devices either locally or over a wide area.

The aforementioned apparatus may be provided as a single client device (for example, as a handset or laptop or desktop configured to present retrieved information in accordance with the electronic media content). In this example, the 'data storage' is volatile and/or non-volatile memory of the client device—for example, where outgoing and incoming content is digitally stored in the client device or a peripheral storage device of the client device.

Alternatively or additionally, the apparatus may be distributed on a plurality of devices, for example, with a 'client-server' architecture.

These and further embodiments will be apparent from the detailed description and examples that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

While the invention is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning "having the potential to"), rather than the mandatory sense (i.e., meaning "must").

FIGS. 1A-1C describe exemplary use scenarios.

FIGS. 2A-2D, 4, and 5A-5C provide flow charts of exemplary techniques for locating, retrieving and/or presenting information related to electronic media content of a multi-party conversation.

FIG. 3 describes an exemplary technique for computing one or more features of electronic media content including voice content.

FIG. 6 provides a block diagram of an exemplary system for retrieving and presenting information in accordance with some embodiments of the present invention.

FIG. 7 describes an exemplary system for providing electronic media content of a multi-party conversation.

FIGS. 8-14 describe exemplary systems for computing various features.

DETAILED DESCRIPTION OF EMBODIMENTS

The present invention will now be described in terms of specific, example embodiments. It is to be understood that the invention is not limited to the example embodiments disclosed. It should also be understood that not every feature of the presently disclosed apparatus, device and computer-readable code for information retrieval and presentation is necessary to implement the invention as claimed in any particular one of the appended claims. Various elements and features of devices are described to fully enable the invention. It should also be understood that throughout this disclosure, where a process or method is shown or described, the steps of the method may be performed in any order or simultaneously, unless it is clear from the context that one step depends on another being performed first.

Embodiments of the present invention relate to a technique for retrieving and displaying information in accordance with the context and/or content of voice content, including but not limited to voice content transmitted over a telecommunications network in the context of a multiparty conversation.

Certain examples related to this technique are now explained in terms of exemplary use scenarios. After presentation of the use scenarios, various embodiments of the present invention will be described with reference to flow-charts and block diagrams. It is noted that the use scenarios relate to the specific case where the retrieved information is presented 'visually' by the client device. In other examples, the information may be presented by audio means—for example, before, during or following a call or conversation.

Also, it is noted that the present use scenarios and many other examples relate to the case where the multi-party conversation is transmitted via a telecommunications network (e.g. circuit switched and/or packet switched). In another example, two or more people are conversing 'in the same room' and the conversation is recorded by a single microphone or a plurality of microphones (and optionally one or more cameras) deployed 'locally' without any need for transmitting content of the conversation via a telecommunications network.

Use Scenario 1 Example of FIG. 1A

According to this scenario, a first user (i.e. 'party 1') of a "car phone" (i.e. a mobile phone mounted in a car, for example, in proximity of an onboard navigator system) converses with a second user (i.e. 'party 2') using VOIP software residing on the desktop, such as Skype® software.

In this example, at time t=t1, retrieved information is served to party 1 in accordance with content of the conversation. In the example of FIG. 1A, when Party 2 mentions "Millard Fillmore," information about Millard Fillmore (for example, from a search engine or Wikipedia article) is retrieved and displayed on a client device associated with "party 1"—either the "small screen" of Party 1's car-mounted cellphone or the "larger screen" of party 1's onboard navigator device.

It is noted that in the example of FIG. 1A, there is no need for "Party 1" to provide any search query whatsoever—a conversation is monitored that is not directed to the entity doing the monitoring; rather, the words of Party 1 are directed exclusively to other co-conversationalist(s)—in this case Party 2, and the words of Party 2 are directed exclusively to Party 1.

In the example of FIG. 1A, Party 2 "knows" that Party 1 is driving and cannot key in a search query, for example, into a standard Internet search engine. Thus, when Party 1 unexpectedly knows extensive information about Millard Fillmore (i.e. a rather exotic topic), Party 1 succeeds in surprising Party 2.

It is noted that the decision to search on "Millard Fillmore" rather than "school" may be made using natural language processing techniques—for example, language-model based techniques discussed below.
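As a toy stand-in for such natural language processing, one could prefer the candidate phrase that is rarest in everyday language, on the theory that "Millard Fillmore" carries more search value than "school." The frequency table below is purely illustrative; a real system would use a proper language model.

```python
# Illustrative unigram frequencies (fractions of everyday usage).
WORD_FREQUENCY = {"school": 0.02, "millard": 0.000001, "fillmore": 0.000002}

def rarity(phrase, default_freq=0.0001):
    freqs = [WORD_FREQUENCY.get(w, default_freq) for w in phrase.lower().split()]
    # A phrase is roughly as rare as its least common word.
    return 1.0 / min(freqs)

def pick_query(candidate_phrases):
    """Choose the candidate phrase most likely to make a useful query."""
    return max(candidate_phrases, key=rarity)

print(pick_query(["school", "Millard Fillmore"]))  # -> 'Millard Fillmore'
```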

Use Scenario 2 Example of FIG. 1B

In this example, party 1 is located in Cleveland and party 2 is located in Boston. Party 2 is driving in a region of the city where a building was recently on fire. They are discussing the building fire. In the example of FIG. 1B, after the word "fire" is mentioned, a news story about a fire is displayed on a screen of user 1. The fire is not a major fire, and at the time, a number of small fires are being handled in different cities throughout the United States. Thus, a certain amount of "disambiguation" is required in order to serve information about the "correct" fire.

In the example of FIG. 1B, it is possible to detect the location of party 2 (i.e. Boston) (for example, using a phone number or other technique) and to serve the "correct" local news story to the device of party 1.

Use Scenario 3 Example of FIG. 1C

In this example, party 1 proposes going to a Yankees game. Party 2 does not mention anything specific about the Yankees. Nevertheless, information about the Yankees (for example, an article about the history of the Yankees, or a news story about their latest game) is retrieved and served to the client terminal device of party 2. This is one example of information being retrieved and served (i.e. to the "cellphone" of party 2) in accordance with "incoming" (i.e. incoming to the "cellphone" client terminal device of Party 2) electronic media content of the multi-party conversation.

SOME BRIEF DEFINITIONS

As used herein, 'providing' of media or media content includes one or more of the following: (i) receiving the media content (for example, at a server cluster comprising at least one server, for example, operative to analyze the media content, and/or at a proxy); (ii) sending the media content; (iii) generating the media content (for example, carried out at a client device such as a cell phone and/or PC); (iv) intercepting; and (v) handling media content, for example, on the client device, on a proxy or server.

As used herein, a 'multi-party' voice conversation includes two or more parties, for example, where each party communicates using a respective client device including but not limited to a desktop, laptop, cell-phone, or personal digital assistant (PDA).

In one example, the electronic media content from the multi-party conversation is provided from a single client device (for example, a single cell phone or desktop). In another example, the media from the multi-party conversation includes content from different client devices. Similarly, in one example, the electronic media content from the multi-party conversation is from a single speaker or a single user. Alternatively, in another example, the electronic media content from the multi-party conversation is from multiple speakers.

The electronic media content may be provided as streaming content. For example, streaming audio (and optionally video) content may be intercepted, for example, as transmitted over a telecommunications network (for example, a packet switched or circuit switched network). Thus, in some embodiments, the conversation is monitored on an ongoing basis during a certain time period.

Alternatively or additionally, the electronic media content is pre-stored content, for example, stored in any combination of volatile and non-volatile memory.

As used herein, 'presenting of retrieved information in accordance with at least one feature' includes one or more of the following:

i) configuring a client device (i.e. a screen of a client device) to display the retrieved information such that the display of the client device displays the retrieved information in accordance with the feature of media content. This configuring may be accomplished, for example, by displaying the retrieved information using an email client and/or a web browser and/or any other client residing on the client device;

ii) sending or directing or targeting the retrieved information to a client device in accordance with the feature of the media content (for example, from a client to a server, via an email message, an SMS or any other method).

DETAILED DESCRIPTION OF BLOCK DIAGRAMS AND FLOW CHARTS

FIG. 2A refers to an exemplary technique for retrieving and presenting information in accordance with content of a multi-party conversation.

In step S109, electronic digital media content including spoken or voice content (e.g. of a multi-party audio conversation) is provided—e.g. received and/or intercepted and/or handled.

In step S111, one or more aspects of the electronic voice content (for example, content of a multi-party audio conversation) are analyzed, or context features are computed. In one example, the words of the conversation are extracted from the voice conversation and the words are analyzed, for example, for a presence of key phrases.

In another example, discussed further below, an accent of one or more parties to the conversation is detected. If, for example, one party has a 'Texas accent,' then this increases a likelihood that the party will receive (for example, on her terminal such as a cellphone or desktop) information from a Texas-based online newspaper or magazine.

In another example, the multi-party conversation is a 'video conversation' (i.e. voice plus video). In a particular example, if a conversation participant is wearing, for example, a hat or jacket associated with a certain sports team (for example, a particular baseball team), and if that sports team is scheduled to play an "away game" in a different city, a local weather forecast or traffic forecast associated with the game may be presented either to the "fan" or to a co-conversationalist (for example, using a different client terminal device) who could then "impress" the "fan" with his knowledge.

In step S113, one or more operations are carried out to retrieve and present information in accordance with results of the analysis of step S111.

The information may be retrieved from any source, including but not limited to online search engines, news services (for example, newswires or "news sites" like www.cnn.com or www.nytimes.com), images or video banks, RSS feeds, weather or traffic forecasts, Youtube® clips, sports statistics, Digg, social editing sites, music banks, shopping sites such as Amazon, Del.icio.us, and blogs. The information source may be local (for example, the local file system of a desktop computer or PDA) and/or may be remote (for example, a remote "Internet" search engine accessible via the Internet).

Although advertisement information may be served together with the retrieved information, in many examples, the retrieved information includes information other than advertisements, such as: Wikipedia entries, entries from social networks (such as dating sites, MySpace, LinkedIn, etc.), news articles, blogs, video or audio clips, or just about any form of information.

FIG. 2B presents a flow-chart of a technique where outgoing and/or incoming content is monitored S411, and in accordance with the content, information is retrieved and presented S415. One example of how this is accomplished in accordance with "incoming content" was discussed with reference to FIG. 1C.

FIG. 2C provides a flow-chart wherein a terminal device is monitored S411 for an incoming and/or outgoing call with another client terminal device. In the event that an incoming and/or outgoing call or a "connection" is detected S415, information is retrieved in accordance with incoming and/or outgoing content of the multi-party conversation and presented.

It is known that a conversation can "flow" and in many conversations, multiple topics are discussed. FIG. 2D provides a flow chart of an exemplary technique where: (i) a first information retrieval and presentation is carried out in accordance with a first "batch" of content or words (S411 and S415); and (ii) when the topic changes or another event occurs S425 (for example, a speaker gets excited about something, raises his or her voice, looks up, repeats a phrase, etc.—for example, beyond some threshold), information may be retrieved and presented (i.e. by displacing the previously-retrieved information from the first batch of electronic media content) in accordance with content S429 of a "second batch" of content or words.

In one example, the "earlier" information may be scrolled down. Alternatively or additionally, a "link" or interface element "pointing" to the most recent content may be re-configured to, upon user invocation, provide the retrieved information for the "second batch" of content rather than the "first batch" of content, after, for example, the topic has changed and/or the user or conversation-participant has indicated a particular emotion or body language, etc.

Obtaining a Demographic Profile of a Conversation Participant from Audio and/or Video Data Relating to a Multi-Party Voice and Optionally Video Conversation (with Reference to FIG. 3)

FIG. 3 provides exemplary types of features that are computed or assessed S111 when analyzing the electronic media content. These features include but are not limited to speech delivery features S151, video features S155, conversation topic parameters or features S159, key word(s) features S161, demographic parameters or features S163, health or physiological parameters or features S167, background features S169, localization parameters or features S175, influence features S175, history features S179, and deviation features S183.

Thus, in some embodiments, by analyzing and/or monitoring a multi-party conversation (i.e. voice and optionally video), it is possible to assess (i.e. determine and/or estimate) S163 if a conversation participant is a member of a certain demographic group from a current conversation and/or historical conversations. This information may then be used to more effectively retrieve and present "pertinent" information to the user and/or an associate of the user.

Relevant demographic groups include but are not limited to: (i) age; (ii) gender; (iii) educational level; (iv) household income; (v) ethnic group and/or national origin; (vi) medical condition.

(i) age/(ii) gender—in some embodiments, the age of a conversation participant is determined in accordance with a number of features, including but not limited to one or more of the following: speech content features and speech delivery features (a crude combining sketch follows the list below).

- A) Speech content features—after converting voice content into text, the text may be analyzed for the presence of certain words or phrases. This may be predicated, for example, on the assumption that teenagers use certain slang or idioms unlikely to be used by older members of the population (and vice-versa).
- B) Speech delivery features—in one example, one or more speech delivery features such as the voice pitch or speech rate (for example, measured in words/minute) of a child and/or adolescent may be different than the speech delivery features of a young adult or elderly person.
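An intentionally crude sketch combining the two feature families above is shown below; the pitch thresholds and slang list are illustrative assumptions, not values taken from this disclosure.

```python
TEEN_SLANG = {"lol", "omg", "whatever", "totally"}

def estimate_age_group(mean_pitch_hz, transcript):
    """Combine a speech delivery feature (pitch) with a speech content
    feature (slang usage) into a rough age-group estimate."""
    slang_hits = sum(word in TEEN_SLANG for word in transcript.lower().split())
    if mean_pitch_hz > 220 and slang_hits >= 2:
        return "teen"
    if mean_pitch_hz > 180:
        return "young adult"
    return "adult"

print(estimate_age_group(240, "omg that was totally wild"))  # -> 'teen'
```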

The skilled artisan is referred to, for example, US 20050286705, incorporated herein by reference in its entirety, which provides examples of certain techniques for extracting certain voice characteristics (e.g. language/dialect/accent, age group, gender).

In one example related to video conversations, the user's physical appearance can also be indicative of a user's age and/or gender. For example, gray hair may indicate an older person, facial hair may indicate a male, etc.

Once an age or gender of a conversation participant is assessed, it is possible to target retrieved information to the participant (or an associate thereof) accordingly.

(iii) educational level—in general, more educated people (i.e. college-educated people) tend to use a different set of vocabulary words than less educated people.

Information retrieval and/or presentation can be customized using this demographic parameter as well. For example, if it is assumed that a conversationalist is college educated, then content or links to content from publications with more sophisticated vocabulary may be favored (see the "New York Times" example above).

(iv) ethnic group and/or national origin—this feature also may be assessed or determined using one or more of speech content features and speech delivery features.

(v) number of children per household—this may be observable from background 'voices' or noise or from a background image.

In one example, if background noise indicative of a presence of children is detected in the background (for example, from voice pitch or a baby crying), then "child-oriented" content (for example, a link to a Sesame Street clip) or "parent-oriented" content (for example, an article from Parenting magazine online) may be presented.

Thus, in one example, if two people are discussing movies, each on a respective cell phone, and a baby crying is detected in the background for the first "cell phone," then the first user may be served an article about popular movies for young children.

If the conversation then shifts to the topic of vacations, and a dog barking is detected in the background for the second "cell phone," then the second user on the second cell phone may be served an article about popular "pet-friendly" vacation destinations.

One example of 'speech content features' includes slang or idioms that tend to be used by a particular ethnic group or non-native English speakers whose mother tongue is a specific language (or who come from a certain area of the world).

One example of 'speech delivery features' relates to a speaker's accent. The skilled artisan is referred, for example, to US 2004/0096050, incorporated herein by reference in its entirety, and to US 2006/0067508, incorporated herein by reference in its entirety.

(vi) medical condition—in some embodiments, a user's medical condition (either temporary or chronic) may be assessed in accordance with one or more audio and/or video features.

In one example, breathing sounds may be analyzed, and breathing rate may be determined. This may be indicative of whether or not a person has some sort of respiratory ailment, and data from a medical database could be presented to the user.

Alternatively, breathing sounds may be indicative of user emotions and/or user interest in a topic.

Storing Biometric Data (for Example, Voice-Print Data) and Demographic Data (with Reference to FIG. 4)

Sometimes it may be convenient to store data about previous conversations and to associate this data with user account information. Thus, the system may determine from a first conversation (or set of conversations) specific data about a given user with a certain level of certainty.

Later, when the user engages in a second multi-party conversation, it may be advantageous to access the earlier-stored demographic data in order to provide to the user pertinent information. Thus, there is no need for the system to re-profile the given user.

In another example, the earlier demographic profile may be refined in a later conversation by gathering more 'input data points.'

In some embodiments, the user may be averse to giving 'account information'—for example, because there is a desire not to inconvenience the user.

Nevertheless, it may be advantageous to maintain a 'voice print' database which would allow identifying a given user from his or her 'voice print.'

Recognizing an identity of a user from a voice print is known in the art—the skilled artisan is referred to, for example, US 2006/0188076; US 2005/0131706; US 2003/0125944; and US 2002/0152078, each of which is incorporated herein by reference in its entirety.

Thus, in step S211, content (i.e. voice content and optionally video content) of a multi-party conversation is analyzed and one or more biometric parameters or features (for example, a voice print or face 'print') are computed. The results of the analysis and optionally demographic data are stored and are associated with a user identity and/or voice print data.

During a second conversation, the identity of the user is determined and/or the user is associated with the previous conversation using voice print data based on analysis of voice and/or video content S215. At this point, the previous demographic information of the user is available.

Optionally, the demographic profile is refined by analyzing the second conversation.

Techniques for Retrieving and/or Presenting Information in Accordance with a Multi-Party Conversation

FIG. 5A provides a flow chart of an exemplary technique for retrieving and providing information. In the example of FIG. 5A, certain words are given "weights" in the information retrieval according to one or more features of a conversation participant. For example, if it is determined that a given conversation-participant is "dominant" in the conversation (i.e. either from a personality profile or from the interaction between conversation-participants), words spoken by this participant may be given a greater weight in information retrieval or search.

In another example, words spoken excitedly and/or with certain body language may be given greater weight.

FIG. 5B relates to a technique where a term disambiguation S309 may be carried out in accordance with one or more features of a conversation participant. For example, if it is assessed that a person is an avid investor or computer enthusiast, then the word "apple" may be handled by retrieving information related to Apple Computer.

Another example relates to the word Madonna—this could refer either to the "Virgin Mary" or to a singer. If it is assessed that a conversation participant is an avid Catholic, it is more likely the former. If it is assessed that a conversation participant likes pop music (for example, from background sounds, age demographics, slang, etc.), then Madonna is more likely to refer to the singer.
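The disambiguation step might be sketched as below: each sense of an ambiguous term carries affinities to profile traits, and the sense that best matches the participant's estimated profile wins. Both the sense inventory and the scores are made-up illustrations.

```python
SENSES = {
    "madonna": {"the singer":      {"pop_music_fan": 0.9, "devout_catholic": 0.1},
                "the Virgin Mary": {"pop_music_fan": 0.1, "devout_catholic": 0.9}},
    "apple":   {"Apple Computer":  {"computer_enthusiast": 0.9},
                "the fruit":       {"computer_enthusiast": 0.1}},
}

def disambiguate(term, profile):
    """Pick the sense whose trait affinities best match the participant's
    estimated profile (each trait scored in [0, 1])."""
    senses = SENSES[term.lower()]
    def score(sense):
        return sum(profile.get(trait, 0.0) * w for trait, w in senses[sense].items())
    return max(senses, key=score)

# An avid Catholic profile steers "Madonna" toward the religious sense.
print(disambiguate("Madonna", {"devout_catholic": 0.8, "pop_music_fan": 0.1}))
```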

In the exemplary technique of FIG. 5C, words are given greater "weight" or priority in accordance with body language and/or speech delivery features.

Discussion of Exemplary Apparatus

FIG. 6 provides a block diagram of an exemplary system 100 for retrieval and presentation of information in accordance with some embodiments of the present invention. The apparatus or system, or any component thereof, may reside at any location within a computer network (or single computer device), i.e. on the client terminal device 10, on a server or cluster of servers (not shown), proxy, gateway, etc. Any component may be implemented using any combination of hardware (for example, non-volatile memory, volatile memory, CPUs, computer devices, etc.) and/or software—for example, coded in any language including but not limited to machine language, assembler, C, C++, Java, C#, Perl, etc.

The exemplary system 100 may include an input 110 for receiving one or more digitized audio and/or visual waveforms, a speech recognition engine 154 (for converting a live or recorded speech signal to a sequence of words), one or more feature extractor(s) 118, a historical data storage 142, and a historical data storage updating engine 150.

Exemplary implementations of each of the aforementioned components are described below.

It is appreciated that not every component in FIG. 6 (or any other component described in any figure or in the text of the present disclosure) must be present in every embodiment. Any element in FIG. 6, and any element described in the present disclosure, may be implemented as any combination of software and/or hardware. Furthermore, any element in FIG. 6 and any element described in the present disclosure may either reside on or within a single computer device, or be distributed over a plurality of devices in a local or wide-area network.

Audio and/or Video Input 110

In some embodiments, the media input 110 for receiving a digitized waveform is a streaming input. This may be useful for ‘eavesdropping’ on a multi-party conversation in substantially real time. In some embodiments, ‘substantially real time’ refers to time with no more than a predetermined time delay, for example, a delay of at most 15 seconds, or at most 1 minute, or at most 5 minutes, or at most 30 minutes, or at most 60 minutes.

In FIG. 7, a multi-party conversation is conducted using client devices or communication terminals 10 (i.e. N terminals, where N is greater than or equal to two) via the Internet 20. In one example, VOIP software such as Skype® software resides on each terminal 10.

In one example, ‘streaming media input’ 110 may reside as a ‘distributed component’ where an input for each party of the multi-party conversation resides on a respective client device 10. Alternatively or additionally, streaming media signal input 110 may reside at least in part ‘in the cloud’ (for example, at one or more servers deployed over a wide-area and/or publicly accessible network such as the Internet 20). Thus, according to this implementation, the audio streaming signals and/or video streaming signals of the conversation may be intercepted as they are transmitted over the Internet.

In yet another example, input 110 does not necessarily receive or handle a streaming signal. In one example, stored digital audio and/or video waveforms may be provided in non-volatile memory (including but not limited to flash, magnetic and optical media) or in volatile memory.

It is also noted, with reference to FIG. 7, that the multi-party conversation is not required to be a VOIP conversation. In yet another example, two or more parties are speaking to each other in the same room, and this conversation is recorded (for example, using a single microphone, or more than one microphone). In this example, the system 100 may include a ‘voice-print’ identifier (not shown) for determining an identity of a speaking party (or for distinguishing between speech of more than one person).

In yet another example, at least one communication device is a cellular telephone communicating over a cellular network.

In yet another example, two or more parties may converse over a ‘traditional’ circuit-switched phone network, and the audio sounds may be streamed to information retrieval and presentation system 100 and/or provided as recorded digital media stored in volatile and/or non-volatile memory.

Feature Extractor(s) 118

FIG. 8 provides a block diagram of several exemplary feature extractor(s)—this is not intended as comprehensive but merely describes a few feature extractor(s). These include: text feature extractor(s) 210 for computing one or more features of the words extracted by speech recognition engine 154 (i.e. features of the words spoken); speech delivery feature extractor(s) 220 for determining features of how words are spoken; speaker visual appearance feature extractor(s) 230 (i.e. provided in some embodiments where video as well as audio signals are analyzed); and background feature extractor(s) 250 (i.e. relating to background sounds or noises and/or background images).

It is noted that the feature extractors may employ any technique for feature extraction of media content known in the art, including but not limited to heuristic techniques and/or ‘statistical AI’ and/or ‘data mining’ techniques and/or ‘machine learning’ techniques where a training set is first provided to a classifier or feature calculation engine. The training may be supervised or unsupervised.

Exemplary techniques include but are not limited to tree techniques (for example, binary trees), regression techniques, Hidden Markov Models, Neural Networks, and meta-techniques such as boosting or bagging. In specific embodiments, a statistical model is created in accordance with previously collected “training” data. In some embodiments, a scoring system is created. In some embodiments, a voting model for combining more than one technique is used.

Appropriate statistical techniques are well known in the art, and are described in a large number of well-known sources including, for example, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations by Ian H. Witten and Eibe Frank (Morgan Kaufmann, October 1999), the entirety of which is herein incorporated by reference.
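By way of a toy illustration only, a tree technique combined with a bagging meta-technique might be trained roughly as follows. The feature rows, the labels, and the use of the scikit-learn library are assumptions of this sketch, not part of the disclosure.

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Invented training rows: [words per minute, mean pitch (Hz), loudness (dB)]
    X_train = [[120, 210, 60], [95, 110, 55], [150, 220, 70], [80, 100, 50]]
    y_train = ["female", "male", "female", "male"]  # placeholder labels

    # Bagging (a meta-technique) over decision trees (a tree technique).
    model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10)
    model.fit(X_train, y_train)
    print(model.predict([[130, 200, 65]]))  # classify a new speaker's features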

It is noted that in exemplary embodiments a first feature may be determined in accordance with a different feature, thus facilitating ‘feature combining.’

In some embodiments, one or more feature extractors or calculation engines may be operative to effect one or more ‘classification operations’—e.g. determining a gender of a speaker, an age range, ethnicity, income, and many other possible classification operations.

Each element described in FIG. 8 is described in further detail below.

Text Feature Extractor(s) 210

FIG. 9 provides a block diagram of exemplary text feature extractors. Thus, certain phrases or expressions spoken by a participant in a conversation may be identified by a phrase detector 260.

In one example, when a speaker uses a certain phrase, this may indicate a current desire or preference. For example, if a speaker says “I am quite hungry,” this may indicate that a food product ad should be sent to the speaker.

In another example, a speaker may use certain idioms that indicate a general desire or preference rather than a desire at a specific moment. For example, a speaker may make a general statement regarding a preference for American cars, a professed love for his children, or a distaste for a certain sport or activity. These phrases may be detected and stored as part of a speaker profile, for example, in historical data storage 142.

The speaker profile built from detecting these phrases, and optionally performing statistical analysis, may be useful for present or future provisioning of ads to the speaker or to another person associated with the speaker.

The phrase detector 260 may include, for example, a database of pre-determined words or phrases or regular expressions.
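A minimal sketch of such a detector follows; the two-entry pattern database and its interest-category tags are invented examples standing in for whatever sponsor-determined set is actually used.

    import re

    # Hypothetical phrase database: compiled pattern -> interest category.
    PHRASE_DB = {
        re.compile(r"\bI(?:'m| am) (?:quite |really )?hungry\b", re.I): "food",
        re.compile(r"\bI (?:love|prefer) American cars\b", re.I): "automotive",
    }

    def detect_phrases(utterance):
        # Phrase detector 260, reduced to a scan over the pattern database.
        return [topic for pattern, topic in PHRASE_DB.items()
                if pattern.search(utterance)]

    # detect_phrases("well, I am quite hungry") -> ["food"]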

In one example, it is recognized that the computational cost associated with analyzing text to determine the appearance of certain phrases (i.e. from a pre-determined set) may increase with the size of the set of phrases.

Thus, the exact set of phrases may be determined by various business considerations. In one example, certain sponsors may ‘purchase’ the right to include certain phrases relevant for the sponsor's product in the set of words or regular expressions.

In another example, the text feature extractor(s) 210 may be used to provide a demographic profile of a given speaker. For example, usage of certain phrases may be indicative of an ethnic group or a national origin of a given speaker. As will be described below, this may be determined using some sort of statistical model, heuristics, or scoring system.

In some embodiments, it may be useful to analyze frequencies of words (or word combinations) in a given segment of conversation using a language model engine 256.

For example, it is recognized that more educated people tend to use a different vocabulary in their speech than less educated people. Thus, it is possible to prepare pre-determined conversation ‘training sets’ of more educated people and conversation ‘training sets’ of less educated people. For each pre-determined conversation ‘training set,’ frequencies of various words may be computed and a language model of word (or word combination) frequencies may be constructed.

According to this example, when a segment of conversation is analyzed, it is possible (i.e. for a given speaker or speakers) to compare the frequencies of word usage in the analyzed segment of conversation, and to determine whether the frequency table more closely matches the training set of more educated people or of less educated people, in order to obtain demographic data (i.e. an estimated education level).

This principle could be applied using pre-determined ‘training sets’ for native English speakers vs. non-native English speakers, training sets for different ethnic groups, and training sets for people from different regions. This principle may also be used for different conversation ‘types.’ For example, conversations related to computer technologies would tend to provide an elevated frequency for one set of words, romantic conversations would tend to provide an elevated frequency for another set of words, etc. Thus, for different conversation types, or conversation topics, various training sets can be prepared. For a given segment of analyzed conversation, word frequencies (or word combination frequencies) can then be compared with the frequencies of one or more training sets.
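The comparison described above can be sketched with two smoothed unigram models; the toy ‘training sets,’ the vocabulary-size constant, and the add-one smoothing are placeholder assumptions standing in for real corpora and a real language-modeling choice.

    import math
    from collections import Counter

    def unigram_model(corpus):
        counts = Counter(corpus.lower().split())
        return counts, sum(counts.values())

    def log_likelihood(segment, model, vocab_size=10000):
        counts, total = model
        score = 0.0
        for word in segment.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    model_a = unigram_model("furthermore notwithstanding subsequently hence")
    model_b = unigram_model("yeah cool gonna wanna stuff like totally")

    segment = "yeah gonna grab stuff later"
    closer = max(("A", model_a), ("B", model_b),
                 key=lambda m: log_likelihood(segment, m[1]))[0]
    print(closer)  # "B": the segment better matches the informal training set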

The same principle described for word frequencies can also be applied to sentence structures—i.e. certain pre-determined demographic groups or conversation types may be associated with certain sentence structures. Thus, in some embodiments, a part-of-speech (POS) tagger 264 is provided.
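As a sketch, part-of-speech tag sequences can be turned into comparable frequency features in the same way; the example below assumes the NLTK library with its tokenizer and tagger data already downloaded, which is an assumption of this sketch rather than part of the disclosure.

    import nltk

    def pos_trigrams(sentence):
        # POS tagger 264: map words to tags, then take tag trigrams as crude
        # sentence-structure features whose frequencies can be compared
        # against per-training-set models, as with word frequencies above.
        tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(sentence))]
        return list(zip(tags, tags[1:], tags[2:]))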

A Discussion of FIGS. 10-15

FIG. 10 provides a block diagram of an exemplary system 220 for detecting one or more speech delivery features. This includes an accent detector 302, tone detector 306, speech tempo detector 310, and speech volume detector 314 (i.e. for detecting loudness or softness).

As with any feature detector or computation engine disclosed herein, speech delivery feature extractor 220 or any component thereof may be pre-trained with ‘training data’ from a training set.

FIG. 11 provides a block diagram of an exemplary system 230 for detecting speaker appearance features—i.e. for video media content, for the case where the multi-party conversation includes both voice and video. This includes a body gestures feature extractor(s) 352 and a physical appearance features extractor 356.

FIG. 12 provides a block diagram of an exemplary background feature extractor(s) 250. This includes (i) audio background features extractor 402 for extracting various features of background sounds or noise, including but not limited to specific sounds or noises such as pet sounds, an indication of background talking, an ambient noise level, a stability of an ambient noise level, etc.; and (ii) visual background features extractor 406 which may, for example, identify certain items or features in the room, for example, certain products or brands present in a room.
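As a toy illustration of audio background features extractor 402 only, the ambient noise level and its stability might be estimated from the quietest frames of a mono waveform; the frame size, the 20% quantile, and the stability formula are arbitrary assumptions of this sketch.

    import numpy as np

    def background_audio_features(waveform, frame=1024):
        # waveform: mono numpy array of samples.
        n = max(1, len(waveform) // frame)
        energies = np.array([
            float(np.sqrt(np.mean(waveform[i*frame:(i+1)*frame] ** 2)))
            for i in range(n)
        ])
        # Treat the quietest 20% of frames as the ambient "background."
        floor = energies[energies <= np.percentile(energies, 20)]
        return {
            "ambient_noise_level": float(floor.mean()),
            "ambient_noise_stability": float(1.0 / (1.0 + floor.std())),
        }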

FIG. 13 provides a block diagram of additional feature extractors 118 for determining one or more features of the electronic media content of the conversations. Certain features may be ‘combined features’ or ‘derived features’ derived from one or more other features.

This includes a conversation harmony level classifier 452 (for example, determining if a conversation is friendly or unfriendly, and to what extent), a deviation feature calculation engine 456, a feature engine for demographic feature(s) 460, a feature engine for physiological status 464, a feature engine for conversation participants relation status 468 (for example, family members, business partners, friends, lovers, spouses, etc.), a conversation expected length classifier 472 (i.e. if the end of the conversation is expected within a ‘short’ period of time, the information retrieval may be carried out differently than for the situation where the end of the conversation is not expected within a short period of time), a conversation topic classifier 476, etc.

FIG. 14 provides a block diagram of exemplary demographic feature calculators or classifiers. This includes gender classifier 502, ethnic group classifier 506, income level classifier 510, age classifier 514, national/regional origin classifier 518, tastes classifier 522 (for example, in clothes and goods), educational level classifier 526, marital status classifier 530, job status classifier 534 (i.e. employed vs. unemployed, manager vs. employee, etc.), and religion classifier 538 (i.e. Jewish, Christian, Hindu, Muslim, etc.).

In one example related to retrieval and/or presentation of information in accordance with a demographic profile and related to religion classifier 538, a religion of a person is detected, for example, using key-words, accent and/or speaker location. One example relates to a speaker who often speaks about Jewish topics, or who may often listen to Klezmer music or Yiddish music in the background. In one particular example, if the speaker is discussing with a friend a desire to cook dinner, certain recipes may be presented to the speaker—if the speaker is Jewish, recipes that include pork may be filtered out.

In another example, if the Jewish speaker is speaking with a friend about the need to find a spouse, personal ads (i.e. from a dating site) may be biased towards people who indicate an interest in Judaism.
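The recipe-filtering example above may be sketched as follows; the exclusion table is a deliberately simplified illustration invented for this sketch, and real systems would use far more careful rules.

    # Illustrative-only mapping from an inferred religion to ingredients
    # to filter out of retrieved recipes.
    DIETARY_EXCLUSIONS = {
        "jewish": {"pork", "shellfish"},
        "hindu": {"beef"},
    }

    def filter_recipes(recipes, religion):
        excluded = DIETARY_EXCLUSIONS.get((religion or "").lower(), set())
        return [r for r in recipes
                if not excluded & {i.lower() for i in r["ingredients"]}]

    recipes = [
        {"name": "Pork stew", "ingredients": ["Pork", "Onion"]},
        {"name": "Chicken soup", "ingredients": ["Chicken", "Carrot"]},
    ]
    # filter_recipes(recipes, "Jewish") keeps only the chicken soup.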

In the description and claims of the present application, each of the verbs “comprise,” “include” and “have,” and conjugates thereof, are used to indicate that the object or objects of the verb are not necessarily a complete listing of members, components, elements or parts of the subject or subjects of the verb.

All references cited herein are incorporated by reference in their entirety. Citation of a reference does not constitute an admission that the reference is prior art.

The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.

The term “including” is used herein to mean, and is used interchangeably with, the phrase “including but not limited to.”

The term “or” is used herein to mean, and is used interchangeably with, the term “and/or,” unless context clearly indicates otherwise.

The term “such as” is used herein to mean, and is used interchangeably with, the phrase “such as but not limited to.”

The present invention has been described using detailed descriptions of embodiments thereof that are provided by way of example and are not intended to limit the scope of the invention. The described embodiments comprise different features, not all of which are required in all embodiments of the invention. Some embodiments of the present invention utilize only some of the features or possible combinations of the features. Variations of embodiments of the present invention that are described, and embodiments of the present invention comprising different combinations of the features noted in the described embodiments, will occur to persons skilled in the art.

CLAIMS

1) A method of providing information-retrieval services, the method comprising: a) monitoring a multi-party voice conversation not directed at the entity doing the monitoring; and b) in accordance with content of said monitored voice conversation, retrieving and presenting information to at least one party of said multi-party voice conversation.

2) The method of claim 1 wherein said retrieving includes retrieving at least one of: i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.

3) The method of claim 1 wherein said retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

4) The method of claim 1 wherein said retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

5) The method of claim 1 wherein said retrieving includes effecting a disambiguation in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

6) The method of claim 1 wherein said retrieving includes assigning a keyword weight in accordance with a speech delivery feature of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.

7) The method of claim 1 wherein said retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with a geographic location of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation.

8) The method of claim 1 wherein said retrieving includes selecting or emphasizing an information-source from a plurality of candidate information-sources in accordance with an accent feature of at least one given party of said multi-party voice conversation.

9) The method of claim 1 wherein said retrieving includes assigning a keyword weight in accordance with a demographic parameter of a given party of said multi-party voice conversation estimated from electronic media of said multi-party conversation, said estimated demographic parameter being selected from the group consisting of: i) an age parameter; ii) a gender parameter; and iii) an ethnicity parameter.

10) The method of claim 1 wherein said information-presenting for a first set of words extracted from said multi-party conversation includes displacing earlier-presented retrieved information associated with a second, earlier set of words extracted from said multi-party conversation in accordance with relative speech delivery parameters of said first and second sets of extracted words, the speech delivery feature being selected from the group consisting of: i) a loudness parameter; ii) a speech tempo parameter; and iii) an emotional outburst parameter.

11) The method of claim 1 wherein said multi-party voice conversation is carried out between a plurality of client terminal devices communicating via a wide-area network, and for a given client device of said client device plurality: i) said information retrieval is carried out for incoming content relative to said given client device; and ii) said information presenting is on a display screen of said given client device.

12) A method of providing information-retrieval services, the method comprising: a) monitoring a terminal device for incoming media content and outgoing media content of a multi-party conversation; and b) in accordance with said incoming media content, retrieving information over a remote network and presenting said retrieved information on said monitored terminal device.

13) The method of claim 1 wherein said retrieving includes sending content of said multi-party conversation to an Internet search engine, and said presenting includes presenting search results from said Internet search engine.

14) The method of claim 12 wherein said retrieving includes retrieving at least one of: i) a social-network profile; ii) a weather forecast; iii) a traffic forecast; iv) a Wikipedia entry; v) a news article; vi) an online forum entry; vii) a blog entry; viii) a social bookmarking web service entry; ix) a music clip; and x) a film clip.

15) A method of providing information-retrieval services, the method comprising: a) monitoring a given terminal client device for an incoming or outgoing remote call; b) upon detecting said incoming or outgoing remote call, sending content of said detected incoming call or outgoing call over a wide-area network to a search engine; and c) presenting search results from said search engine on said monitored terminal device.